
fix(code): reliable cloud message-queue dispatch + optimistic prompt UX#1905

Open

VojtechBartos wants to merge 5 commits into main from posthog-code/prevent-message-send-during-cloud-run-startup

Conversation

VojtechBartos (Member) commented on Apr 28, 2026

Problem

Cloud follow-up messages typed during sandbox setup, death, or restore could be silently lost: the user saw only a generic error toast while the prompt vanished. Users also had no visual anchor for the prompt they typed at task creation until the agent eventually echoed it back via SSE, which can take many seconds while the sandbox provisions.

The deeper framing: local runs already had a working version of all of this. This PR closes the parity gap.

Local–cloud parity at a glance

| Capability | Local (existing) | Cloud (before) | Cloud (this PR) |
| --- | --- | --- | --- |
| "Agent is ready" signal | `agent.reconnect.mutate()` resolving sets `session.status: "connected"` | None (heuristics on `cloudStatus` + `isPromptPending`) | `_posthog/run_started` flips `session.status: "connected"` |
| Queue gate when agent isn't ready | `sendLocalPrompt` checks `session.status` upfront, throws if not connected | Inferred from `cloudStatus` / `isPromptPending` / optimistic-pending heuristic (racy) | `sendCloudPrompt` queues if `session.status !== "connected"` (same shape as local) |
| Drain trigger | N/A (local doesn't queue across turns the same way; mutate is synchronous) | Multiple racy auto-flushes | Single trigger: `_posthog/turn_complete` |
| Re-enqueue on failure | N/A (local mutate throws synchronously) | Old retry loop dropped the prompt on max retries | `prependQueuedMessages` rolls the queue back |
| Optimistic user bubble | `appendOptimisticItem` + `replaceOptimisticWithEvent` swap in place | None | Seeded in `hydrateCloudTaskSessionFromLogs`, pinned at top, content-deduped against the echo |

Changes

  1. chore(agent): emit _posthog/run_started — the long-declared lifecycle handshake, finally wired up. Persisted to the log so warm reconnects replay it.
  2. fix(code): reliable queue dispatch — peek-and-confirm dispatcher, single turn_complete drain trigger, session.status === "connected" queue gate, plus an immer-proxy fix in dequeueMessages that was the silent root cause behind "queue clears, message disappears".
  3. feat(code): pin prompt at top during sandbox setup — optimistic bubble seeded only when there's no prior history, pinned above the conversation for cloud, content-deduped against the agent's echo. Local sessions untouched.
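The peek-and-confirm dispatcher in change 2 can be sketched roughly as follows. The names `dequeueMessages`, `prependQueuedMessages`, and `dispatchingCloudQueues` come from this PR; the in-memory queue store and the `send` transport here are stand-ins, not the real service code.

```typescript
type Prompt = { taskId: string; content: string };

const queues = new Map<string, Prompt[]>();
const dispatchingCloudQueues = new Set<string>(); // per-taskId re-entrance guard

function dequeueMessages(taskId: string): Prompt[] {
  const drained = queues.get(taskId) ?? [];
  queues.set(taskId, []);
  return drained;
}

function prependQueuedMessages(taskId: string, prompts: Prompt[]): void {
  // Roll the queue back in order, ahead of anything enqueued meanwhile.
  queues.set(taskId, [...prompts, ...(queues.get(taskId) ?? [])]);
}

async function dispatchCloudQueue(
  taskId: string,
  send: (prompts: Prompt[]) => Promise<void>,
): Promise<void> {
  if (dispatchingCloudQueues.has(taskId)) return; // another trigger is draining
  dispatchingCloudQueues.add(taskId);
  try {
    const drained = dequeueMessages(taskId);
    if (drained.length === 0) return;
    try {
      await send(drained); // only a successful send actually consumes the prompts
    } catch {
      prependQueuedMessages(taskId, drained); // re-enqueue so the next trigger retries
    }
  } finally {
    dispatchingCloudQueues.delete(taskId);
  }
}
```

A failed dispatch leaves the queue exactly as it was, so the next `_posthog/turn_complete` trigger picks the prompts up again instead of dropping them.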

Showcase

code-queue.mov

Test plan

  • Initial cloud prompt visible immediately during sandbox boot
  • Follow-ups typed during boot / mid-turn / sandbox restore queue and dispatch correctly
  • Reopened task with prior conversation: no flash, no duplicate bubbles
  • Sandbox death → resume → queued messages dispatch after the resume turn completes
  • Local sessions unaffected
  • Typecheck, lint, all tests pass
  • Manually tested end-to-end

Created with PostHog Code

greptile-apps bot (Contributor) commented on Apr 28, 2026

Path: apps/code/src/renderer/features/sessions/service/service.test.ts
Line: 834-890

Comment:
**Real-time path not covered by the new test**

The new test exercises only the log-replay path (via `mockConvertStoredEntriesToEvents`). The real-time path — where `handleSessionEvent` receives the `run_started` `AcpMessage` directly and calls `updatePromptStateFromEvents` — is not tested. If the handler in `updatePromptStateFromEvents` were accidentally removed or broken, no test would catch it.

A second `it` case (or parameterised variant) that calls `handleSessionEvent` with a crafted `run_started` `AcpMessage` and asserts `updateSession` is called with `{ status: "connected" }` would close this gap.


---

Path: apps/code/src/renderer/features/sessions/components/SessionView.tsx
Line: 602-613

Comment:
**Tooltip precedence when both `handoffInProgress` and `!isAgentReady` are true**

When `handoffInProgress` is `true` and `isAgentReady` is also `false`, `submitDisabledExternal` will be `true` but the tooltip will say "Waiting for agent to be ready…" rather than a handoff-related message. If `PromptInput` shows the `submitTooltipOverride` unconditionally when non-undefined, the user sees a potentially misleading message. Worth verifying whether these two conditions are mutually exclusive in practice, or if handoff tooltip priority should be explicit.



VojtechBartos marked this pull request as draft on April 28, 2026 12:43
VojtechBartos force-pushed the posthog-code/prevent-message-send-during-cloud-run-startup branch from a5ac890 to 76e128f on April 29, 2026 07:29
VojtechBartos changed the title from "chore(code): gate Send during cloud agent init" to "fix(code): reliable cloud message-queue dispatch + optimistic prompt UX" on April 29, 2026
VojtechBartos self-assigned this on April 29, 2026
VojtechBartos requested a review from a team on April 29, 2026 08:45
VojtechBartos marked this pull request as ready for review on April 29, 2026 08:46
greptile-apps bot (Contributor) commented on Apr 29, 2026

Path: apps/code/src/renderer/features/sessions/service/service.test.ts
Line: 469-474

Comment:
**Fragile timing-based negative assertion**

This test waits a hard-coded 100 ms and then asserts `updateSession` was *never* called. If the async hydration callback takes longer (slow CI, high load), the assertion races past before the call happens and the test passes for the wrong reason.

A more reliable pattern is to wait on a positive side-effect that proves hydration completed (e.g. `vi.waitFor(() => expect(mockTrpcLogs.writeLocalLogs.mutate).toHaveBeenCalled())`) and only then check the negative:

```ts
// Wait for hydration to finish via a side-effect we know runs after the
// run_started handler (e.g. the log-write that happens unconditionally).
await vi.waitFor(() =>
  expect(mockTrpcLogs.writeLocalLogs.mutate).toHaveBeenCalled(),
);
expect(mockSessionStoreSetters.updateSession).not.toHaveBeenCalledWith(
  "run-123",
  { status: "connected" },
);
```


---

Path: apps/code/src/renderer/features/sessions/components/mergeConversationItems.ts
Line: 193-199

Comment:
**Content-based dedup can suppress a legitimate follow-up with the same text**

While `optimisticItems` is non-empty, *every* `user_message` in `conversationItems` whose `content` matches the optimistic set is filtered — not just the first echo. If the user sends a second message with the exact same wording as the seeded optimistic item (e.g. task description "build me a thing" submitted again), its echo would also be silently removed from the view.

This window is narrow (only while the initial optimistic item persists), so it mainly matters if `clearOptimisticItems` / `replaceOptimisticWithEvent` is not called promptly once the first `session/prompt` echo lands. Worth verifying the cloud session has a reliable path to drain `optimisticItems` after the first echo is confirmed.



Wire up the long-declared `_posthog/run_started` notification after the
ACP session is fully initialized. Documented in `acp-extensions.ts` as the
canonical "agent is up and accepting user messages" handshake. Persisted
to the session log so warm reconnects (sandbox restart with snapshot
resume) replay it. Adapter-agnostic — emitted at the server layer rather
than per-adapter.

Generated-By: PostHog Code
Task-Id: 8228d7eb-50f0-4148-bbc3-d47617e982f7
Cloud follow-up messages typed during sandbox setup, sandbox death, or
restore could be silently lost: the dispatcher's old retry-loop drained
the queue up-front, held the prompt in a local var, and on retry
exhaustion surfaced a generic toast without re-enqueuing. Multiple
auto-flush triggers also raced with the agent's initial `prompt()` call,
producing `stopReason: cancelled` on otherwise-good user messages.

- Dispatcher refactored to peek-and-confirm: drain → send → re-prepend on
  failure. Per-taskId re-entrance guard (`dispatchingCloudQueues`)
  prevents two concurrent triggers from double-dispatching.
- `_posthog/run_started` flips `session.status` to `"connected"` (the
  explicit agent-ready handshake from #1).
- `_posthog/turn_complete` is the only queue-drain trigger now — it's
  the safe boundary, fires when the agent has actually finished a turn.
- `sendCloudPrompt` queues if `status !== "connected"` — covers the
  initial-boot window and sandbox restart/restore window with one signal
  instead of the previous optimistic-pending heuristic.
- Removed the cloudStatus="in_progress" and post-log `!isPromptPending`
  auto-flushes; both raced with `sendInitialTaskMessage`.
- `dequeueMessages` now reads the frozen committed state before entering
  the immer draft. Drained items used to be revoked proxies that crashed
  `combineQueuedCloudPrompts` once the setState callback exited — the
  silent root cause of "queue clears, message lost".
- `prependQueuedMessages` setter rolls the queue back when a dispatch
  fails so the next trigger retries.

Generated-By: PostHog Code
Task-Id: 8228d7eb-50f0-4148-bbc3-d47617e982f7
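The revoked-proxy root cause described above can be reproduced with plain `Proxy.revocable`, which is essentially what immer does to its drafts once the producer (the setState callback) returns. This is a simplified illustration, not the real store code; real immer also revokes nested child drafts.

```typescript
const state = { queue: [{ content: "hello" }] };

// What immer effectively does: expose the state as a revocable proxy for
// the duration of the producer, then revoke it when the producer exits.
function withDraft<T>(producer: (draft: typeof state) => T): T {
  const { proxy, revoke } = Proxy.revocable(state, {});
  try {
    return producer(proxy);
  } finally {
    revoke();
  }
}

// Broken shape: the drained reference escapes as the (now-revoked) proxy;
// any later property access throws a TypeError.
function drainBroken() {
  return withDraft((draft) => draft);
}

// Fixed shape: read the frozen committed state *before* entering the
// draft, so the returned items are plain objects that outlive the revoke.
function drainFixed() {
  const drained = [...state.queue];
  withDraft((draft) => {
    draft.queue = [];
  });
  return drained;
}
```

In the broken shape the queue really is cleared, but the drained items blow up the moment anything touches them afterward, which matches the observed "queue clears, message disappears" symptom.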
Render the optimistic user-message bubble immediately on cloud task
creation instead of waiting for the agent to echo it back via SSE
(which can be many seconds while the sandbox provisions, clones, and
boots). The optimistic seed itself is plumbed in #2 — this commit makes
it visible in the right place.

- ConversationView pins optimistic items above conversationItems for
  cloud sessions and content-dedups the agent's eventual echo so the
  bubble doesn't disappear-then-reappear when the real `session/prompt`
  event lands. Local sessions are unchanged: optimistic stays at
  chronological end and `replaceOptimisticWithEvent` swaps in place.
- `useSessionConnection` plumbs `task.description` into `watchCloudTask`
  so the seed has content.

Generated-By: PostHog Code
Task-Id: 8228d7eb-50f0-4148-bbc3-d47617e982f7
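A rough sketch of the cloud merge behavior described above. The item shape is a stand-in; the real `mergeConversationItems` helper in this PR operates on richer conversation-item types.

```typescript
type Item = { kind: "user_message" | "agent_message"; content: string };

function mergeConversationItems(
  conversationItems: Item[],
  optimisticItems: Item[],
  isCloud: boolean,
): Item[] {
  if (!isCloud || optimisticItems.length === 0) {
    // Local behavior unchanged: optimistic stays at the chronological end
    // until replaceOptimisticWithEvent swaps it in place.
    return [...conversationItems, ...optimisticItems];
  }
  // Cloud: pin optimistic items above the conversation and drop user
  // messages whose content matches, treating them as the agent's echo.
  const optimisticContents = new Set(optimisticItems.map((i) => i.content));
  const deduped = conversationItems.filter(
    (i) => !(i.kind === "user_message" && optimisticContents.has(i.content)),
  );
  return [...optimisticItems, ...deduped];
}
```

This is also the shape the Greptile comment above is probing: while the optimistic set is non-empty, every matching echo is filtered, not just the first.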
Extracts the cloud-vs-local item merge logic from ConversationView into a
pure helper so it can be unit-tested without rendering. Adds coverage for
the run_started lifecycle notification, queue dispatch reliability
(immer-proxy revocation, rollback on failure, status-gated send), and
hydrate-time optimistic seeding.

Generated-By: PostHog Code
Task-Id: 8228d7eb-50f0-4148-bbc3-d47617e982f7
The hydrate-time seed only fired when persisted log entries were literally
empty. By the time the hydrate fetch resolves, the agent has usually
already emitted `_posthog/run_started` and setup-progress notifications,
so a brand-new task with no user prompt yet would skip seeding and the
task description would no longer be pinned at the top.

Switch the gate to "no `session/prompt` request in events" — covers both
the empty-log case and the lifecycle-notifications-only case.

Generated-By: PostHog Code
Task-Id: 8228d7eb-50f0-4148-bbc3-d47617e982f7
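The revised gate amounts to a small predicate over the replayed log events. Sketched here with a stand-in event shape, not the real ACP types:

```typescript
type LogEvent = { method?: string };

// Seed the optimistic prompt bubble only when the replayed log contains no
// session/prompt request yet, so lifecycle-only logs (e.g. just
// _posthog/run_started and setup-progress notifications) still seed.
function shouldSeedOptimisticPrompt(events: LogEvent[]): boolean {
  return !events.some((e) => e.method === "session/prompt");
}
```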
VojtechBartos force-pushed the posthog-code/prevent-message-send-during-cloud-run-startup branch from 05c8979 to d794969 on April 29, 2026 13:39